Abstract. We give a new construction of formulas in Hanf normal form that are equivalent to first-order formulas over structures of bounded degree. This is the first algorithm whose running time is shown to be elementary. The triply exponential upper bound is complemented by a matching lower bound.
1 Introduction



Various syntactical normal forms for semantical properties of structures are known. For example, every first-order definable property that is preserved under extensions of structures is definable by an existential first-order sentence (Łoś-Tarski [23, 20]). Gaifman's normal form is another example that formalizes the observation that first-order logic can only express local properties [10]. A third example in this line is Hanf's theorem, giving another formalization of locality of first-order logic (at least for structures of bounded degree) [12, 6].

Gaifman's and Hanf's theorems have found applications in finite model theory and in particular in parametrized complexity. Namely, they lead to efficient parametrized algorithms deciding whether a formula holds in a (finite) structure [22, 17, 8, 9, 14, 3, 19, 15, 16] and even to more general algorithms that list all the satisfying assignments [5, 13]. Hanf's theorem was also used in the transformation of logical formulas into different automata models [24, 11, 2, 1].

In [4], it was shown that passing from arbitrary formulas to those in Łoś-Tarski or Gaifman normal form leads to a non-elementary blowup. The same paper also proves that for structures of bounded degree, the blowup for Gaifman's normal form is between 2- and 4-fold exponential, and that for Łoś-Tarski normal forms (for a restricted class of structures) is between 2- and 5-fold exponential.

This paper shows that Hanf's normal form can be computed in three-fold exponential 
time and that this is optimal since there is a necessary blowup of three exponentials when 
passing from general first-order formulas to their Hanf normal form. We remark (as already 
observed by Seese [22]) that the first construction of Hanf normal forms [6] is not effective 
since satisfiability of first-order formulas in graphs of bounded degree is undecidable, also 
when we restrict to finite structures [25]. Only Seese [22] gave a small additional argu- 
ment showing that Hanf normal forms can indeed be computed. But his algorithm is not 
primitive recursive. This was improved later to a primitive-recursive algorithm by Durand 
and Grandjean [5] and (independently) by Lindell [19]. Their papers do not give an upper 
bound for the construction of Hanf normal forms, but on the face of it, the algorithm seems not to be elementary.¹ Their algorithm is a quantifier-elimination procedure that only works if the signature consists of finitely many injective functions (following Seese, one can bi-interpret every structure of bounded degree in such a structure, so this is no real restriction of the algorithm). In contrast, our algorithm follows the original proof of Hanf's theorem very closely by examining spheres of bounded diameter, but avoids the detour via Ehrenfeucht-Fraïssé games.

Acknowledgement We would like to thank Luc Segoufin and Arnaud Durand for comments 
on an earlier version and for hints to the literature that improved this paper. 

2 Definitions and background 

Throughout this paper, let $L$ be a finite relational signature and let $L_m$ denote the extension of $L$ by the constants $c_1, c_2, \dots, c_m$. Let $\mathcal{A}$ be an $L_m$-structure. We write $a \in \mathcal{A}$ when we mean that $a$ is an element of the universe of $\mathcal{A}$. Furthermore, $\bar a$ denotes a tuple $(a_1, \dots, a_n)$ of length $n$ of elements of some structure $\mathcal{A}$ and $\bar x$ is the list of variables $(x_1, \dots, x_n)$. In both cases, $n$ will be determined by the context. Finally, we define a distance (with values in $\mathbb{N} \cup \{\infty\}$) on the universe of $\mathcal{A}$ by setting $\mathrm{dist}^{\mathcal{A}}(a, b) = 0$ iff $a = b$, and $\mathrm{dist}^{\mathcal{A}}(a, c) = d + 1$ if there exists $b \in \mathcal{A}$ with $\mathrm{dist}^{\mathcal{A}}(a, b) \le d$ such that some tuple in some relation of $\mathcal{A}$ contains both $b$ and $c$, and there is no such $b \in \mathcal{A}$ with $\mathrm{dist}^{\mathcal{A}}(a, b) < d$, and $\mathrm{dist}^{\mathcal{A}}(a, b) = \infty$ if $\mathrm{dist}^{\mathcal{A}}(a, b) \ne d$ for all $d \in \mathbb{N}$. Next, the degree of $a \in \mathcal{A}$ is the number of elements $b \in \mathcal{A}$ with $\mathrm{dist}^{\mathcal{A}}(a, b) = 1$; the degree of $\mathcal{A}$ is the supremum of the degrees of $a \in \mathcal{A}$.
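The distance and degree just defined can be sketched in a few lines of Python. This is only an illustration under our own assumptions: a structure is encoded as a list `universe` plus a dictionary `relations` mapping relation names to sets of tuples, and the helper names `gaifman_adjacency`, `dist` and `degree` are hypothetical, not notation from the paper.

```python
from collections import deque
from itertools import combinations


def gaifman_adjacency(universe, relations):
    """Two distinct elements are adjacent iff some tuple of some relation contains both."""
    adj = {a: set() for a in universe}
    for tuples in relations.values():
        for t in tuples:
            for a, b in combinations(set(t), 2):
                adj[a].add(b)
                adj[b].add(a)
    return adj


def dist(adj, a, b):
    """Breadth-first search distance; float('inf') if b is unreachable from a."""
    seen = {a: 0}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        if u == b:
            return seen[u]
        for v in adj[u]:
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    return float("inf")


def degree(adj):
    """Degree of the structure: the maximum number of elements at distance 1."""
    return max((len(neighbours) for neighbours in adj.values()), default=0)


# Example (assumed data): a path 0 - 1 - 2 plus an isolated element 3.
universe = [0, 1, 2, 3]
relations = {"E": {(0, 1), (1, 2)}}
adj = gaifman_adjacency(universe, relations)
```

On this example the two ends of the path are at distance 2, the isolated element is at distance infinity from everything else, and the structure has degree 2.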

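Let $\mathcal{A}$ be an $L$-structure, $\bar a = (a_1, \dots, a_n) \in \mathcal{A}$, and $d \ge 0$. Then $B_d^{\mathcal{A}}(\bar a)$ is the set of elements $b \in \mathcal{A}$ with $\mathrm{dist}^{\mathcal{A}}(a_i, b) < d$ for some $1 \le i \le n$. The $d$-sphere around $\bar a$ is the $L_n$-structure

\[ S_d^{\mathcal{A}}(\bar a) = (\mathcal{A} \restriction B_d^{\mathcal{A}}(\bar a), \bar a) . \]

A $d$-sphere (with $n$ centers) is an $L_n$-structure $(\mathcal{A}, \bar a)$ with $B_d^{\mathcal{A}}(\bar a) = \mathcal{A}$. The $L_n$-structure $\tau = (\mathcal{A}, \bar a)$ is a sphere if there exists $d \ge 0$ such that $(\mathcal{A}, \bar a)$ is a $d$-sphere; the least such $d$ is denoted $d(\tau)$ and is the radius of $(\mathcal{A}, \bar a)$. The $d$-sphere $\tau$ is realised by $\bar a$ in $\mathcal{A}$ if

\[ \tau \cong S_d^{\mathcal{A}}(\bar a) . \]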
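A minimal sketch of the ball and of the induced sphere, assuming the structure is given by an adjacency dictionary `adj` (element to set of neighbours). The function names `ball` and `sphere` are ours. The strict inequality in the definition means that the 1-ball consists of the centers only.

```python
def ball(adj, centers, d):
    """Elements b with dist(a_i, b) < d for some center a_i (strict inequality)."""
    seen = set()
    frontier = set(centers)
    for _ in range(d):
        seen |= frontier
        frontier = {v for u in frontier for v in adj[u]} - seen
    return seen


def sphere(adj, centers, d):
    """The d-sphere around the centers: induced substructure plus the tuple of centers."""
    nodes = ball(adj, centers, d)
    edges = {(u, v) for u in nodes for v in adj[u] if v in nodes}
    return nodes, edges, tuple(centers)


# Example (assumed data): a cycle on six elements 0, ..., 5.
adj = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
```

In the 6-cycle, the 2-ball around element 0 is {5, 0, 1}, and the 2-ball around the pair (0, 3) already covers the whole cycle.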

If two $L$-structures $\mathcal{A}$ and $\mathcal{B}$ satisfy exactly the same first-order sentences, then we write $\mathcal{A} \equiv \mathcal{B}$. If they only satisfy the same sentences of quantifier rank $\le r$, then $\mathcal{A} \equiv_r \mathcal{B}$. Provided the degrees of $\mathcal{A}$ and $\mathcal{B}$ are finite, both these concepts can be characterized using the number of realisations of spheres.

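Theorem 2.1 (Hanf [12]). For any $L$-structures $\mathcal{A}$ and $\mathcal{B}$, we have $\mathcal{A} \equiv \mathcal{B}$ whenever any sphere in $\mathcal{A}$ or $\mathcal{B}$ is finite and any sphere is realised in $\mathcal{A}$ and $\mathcal{B}$ the same number of times or infinitely many times.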

¹ In the meantime, A. Durand has informed us of ongoing work aiming at an elementary upper bound for their algorithm.

This result was sharpened by Fagin, Stockmeyer & Vardi (see also Ebbinghaus & Flum [6]) to characterize the relation $\equiv_r$:

Theorem 2.2 (Fagin et al. [7]). For all $r, f \in \mathbb{N}$ there exist $d, m \in \mathbb{N}$ (where $d$ depends on $r$ only) such that for any $L$-structures $\mathcal{A}$ and $\mathcal{B}$ of degree $\le f$, we have $\mathcal{A} \equiv_r \mathcal{B}$ whenever any $d$-sphere with one center is realised in $\mathcal{A}$ and $\mathcal{B}$ the same number of times or $\ge m$ times.

Proof (of both theorems). The proof proceeds by showing that the respective counting property implies that duplicator has a winning strategy in the Ehrenfeucht-Fraïssé game [6, 18]. This then implies the respective equivalence of $\mathcal{A}$ and $\mathcal{B}$. □

This theorem has (at least) three different applications: The first application (and its original motivation in [7]) is a technique to prove that certain properties are not expressible in first-order logic: One provides two lists of structures $\mathcal{A}_r$ and $\mathcal{B}_r$ where $\mathcal{A}_r$ has the desired property and $\mathcal{B}_r$ does not. Furthermore, for any $r$, $\mathcal{A}_r$ and $\mathcal{B}_r$ satisfy the counting condition from Theorem 2.2 with $d$ and $m$ determined by $r$ and the degree $f$ of $\mathcal{A}_r$ and $\mathcal{B}_r$. This implies $\mathcal{A}_r \equiv_r \mathcal{B}_r$ and therefore the property cannot be expressed by a first-order sentence of quantifier depth $r$. Since this holds for all $r$, the property is not first-order expressible. The simplest such property is connectivity of a graph, where $\mathcal{A}_r$ can be chosen to be a cycle of size $\max(m, 2d)$ and $\mathcal{B}_r$ a disjoint union of two copies of $\mathcal{A}_r$ ($m$ and $d$ are the constants from Theorem 2.2 for $f = 2$).

The second application is an efficient evaluation of first-order properties on finite structures of bounded degree [22, 9, 5]: The idea is to count the number of realisations of spheres up to the threshold $m$ and, depending on the vector obtained that way, decide whether the formula holds or not (we will come back to this aspect later in this section).
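The counting idea behind the first two applications can be made concrete on the connectivity example. The following sketch counts, up to a threshold, how often each isomorphism type of a one-center sphere is realised in a graph. Everything here is an assumption made for illustration: graphs are adjacency dictionaries, sphere types are computed by a brute-force canonical form that is only usable for very small spheres, and the values of $n$, $d$ and $m$ are picked by hand rather than derived from Theorem 2.2.

```python
from collections import Counter
from itertools import permutations


def cycle(n, tag=0):
    """Adjacency of a cycle on n nodes; nodes are (tag, i) so that copies stay disjoint."""
    return {(tag, i): {(tag, (i - 1) % n), (tag, (i + 1) % n)} for i in range(n)}


def ball(adj, center, d):
    """Elements at distance < d from the center."""
    seen, frontier = set(), {center}
    for _ in range(d):
        seen |= frontier
        frontier = {v for u in frontier for v in adj[u]} - seen
    return seen


def sphere_type(adj, center, d):
    """Brute-force canonical form of the d-sphere around center (tiny spheres only)."""
    nodes = ball(adj, center, d)
    edges = {(u, v) for u in nodes for v in adj[u] if v in nodes and u < v}
    others = sorted(nodes - {center})
    codes = []
    for perm in permutations(range(1, len(others) + 1)):
        name = {center: 0, **dict(zip(others, perm))}  # the center is always renamed to 0
        codes.append(tuple(sorted(tuple(sorted((name[u], name[v]))) for u, v in edges)))
    return (len(nodes), min(codes))


def threshold_vector(adj, d, m):
    """Number of realisations of each sphere type, cut off at the threshold m."""
    counts = Counter(sphere_type(adj, a, d) for a in adj)
    return {t: min(c, m) for t, c in counts.items()}


one = cycle(8)                                # connected
two = {**cycle(8, tag=1), **cycle(8, tag=2)}  # two disjoint copies, not connected
```

For these hand-picked parameters, `threshold_vector(one, 2, 3)` and `threshold_vector(two, 2, 3)` coincide although only `one` is connected, which is the phenomenon the inexpressibility argument exploits.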

The third application is a normal form for first-order sentences [6]. For a finite $d$-sphere $\tau$ with $n$ centers, let $\mathrm{sph}_\tau(\bar x)$ denote a formula such that $(\mathcal{A}, \bar a) \models \mathrm{sph}_\tau$ iff $S_d^{\mathcal{A}}(\bar a) \cong \tau$. A Hanf sentence asserts that there are at least $m$ realisations of the finite sphere $\tau$ with one center. Formally, it has the form

\[ \exists x_1, x_2, \dots, x_m : \bigwedge_{1 \le i < j \le m} x_i \ne x_j \land \forall x : \Bigl( \Bigl( \bigvee_{1 \le i \le m} x = x_i \Bigr) \to \mathrm{sph}_\tau(x) \Bigr) \tag{1} \]

which we abbreviate as

\[ \exists^{\ge m} x : \mathrm{sph}_\tau(x) . \]

A sentence is in Hanf normal form if it is a Boolean combination of Hanf sentences.
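As a small worked instance, unfolding the abbreviation for $m = 2$ according to the shape of (1) gives the following sentence (the choice $m = 2$ is ours, purely for illustration):

```latex
\[
  \exists^{\ge 2} x : \mathrm{sph}_\tau(x)
  \;=\;
  \exists x_1, x_2 :\;
    x_1 \ne x_2
    \;\land\;
    \forall x : \bigl( (x = x_1 \lor x = x_2) \to \mathrm{sph}_\tau(x) \bigr)
\]
```

The conjunction of inequalities has $\binom{m}{2}$ conjuncts, so the unfolded sentence grows quadratically in $m$ on top of the size of $\mathrm{sph}_\tau$.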

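Let $\varphi$ and $\psi$ be two formulas with free variables in $x_1, \dots, x_n$. To simplify notation, we will say that $\varphi$ and $\psi$ are $f$-equivalent if, for all structures $\mathcal{A}$ of degree $\le f$, we have

\[ \mathcal{A} \models \forall x_1 \forall x_2 \dots \forall x_n : (\varphi \leftrightarrow \psi) . \]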

Corollary 2.3 (Ebbinghaus & Flum [6]). For every sentence $\varphi$ and all $f \in \mathbb{N}$, there exists an $f$-equivalent sentence $\psi$ in Hanf normal form.
Proof. Let $r$ be the quantifier rank of $\varphi$ and let $d$ and $m$ denote the numbers from Theorem 2.2. Then there are only finitely many $d$-spheres of degree $\le f$ with one center; let $(\tau_1, \dots, \tau_n)$ be the list of these spheres. Now we associate with every structure $\mathcal{A}$ of degree $\le f$ a tuple $t^{\mathcal{A}} \in \{0, 1, \dots, m\}^n$ as follows: For $1 \le i \le n$, let $t_i^{\mathcal{A}}$ denote the minimum of $m$ and the number of $a \in \mathcal{A}$ with $S_d^{\mathcal{A}}(a) \cong \tau_i$. Note that there are only finitely many tuples $t^{\mathcal{A}}$. Now $\psi$ is a disjunction. It has one disjunct for every $t \in \{0, 1, \dots, m\}^n$ for which there exists a structure $\mathcal{A}$ of degree $\le f$ with $\mathcal{A} \models \varphi$ and $t = t^{\mathcal{A}}$. This disjunct is the conjunction of the following formulas for $1 \le i \le n$:

\[ \exists^{= t_i} x : \mathrm{sph}_{\tau_i}(x) \quad \text{if } t_i < m \]
\[ \exists^{\ge m} x : \mathrm{sph}_{\tau_i}(x) \quad \text{if } t_i = m . \]

□
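The shape of a single disjunct in this proof can be sketched as follows. The function only prints the conjunction that belongs to a given tuple $t$; deciding which tuples actually occur is the non-effective part of the proof and is not attempted here. The textual syntax (`E^(=k)`, `E^(>=m)`, `sph_tau1`, `AND`) and the function name `disjunct_for` are our own assumptions.

```python
def disjunct_for(t, m):
    """Conjunction of counting formulas describing the threshold tuple t."""
    parts = []
    for i, t_i in enumerate(t, start=1):
        # exact count below the threshold, "at least m" at the threshold
        quantifier = f"E^(={t_i})" if t_i < m else f"E^(>={m})"
        parts.append(f"{quantifier} x : sph_tau{i}(x)")
    return " AND ".join(parts)


example = disjunct_for((0, 2), m=2)
```

Here `example` states that the first sphere is not realised at all and that the second sphere is realised at least twice.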

Note that $\varphi$ is satisfiable if and only if the disjunction $\psi$ is not empty. Hence an effective construction of $\psi$ would allow us to decide satisfiability of first-order formulas in structures of degree $\le f$, which is not possible [25].

We now turn to finite structures. Clearly, the disjunction $\psi$ as in the above corollary is also equivalent to $\varphi$ for all finite structures of degree $\le f$. But in this context, we can also define another disjunction $\psi_{\mathrm{fin}}$ by taking only those $t \in \{0, 1, \dots, m\}^n$ for which there exists a finite structure $\mathcal{A}$ of degree $\le f$ with $\mathcal{A} \models \varphi$ and $t = t^{\mathcal{A}}$ (cf., e.g., [18, page 101]). As above, an effective construction of $\psi_{\mathrm{fin}}$ would allow us to decide satisfiability of first-order formulas in finite structures of degree $\le f$ which, again, is not possible [25].

Despite the fact that the proof of Cor. 2.3 is not constructive, Seese showed that some sentence $\psi$ as required in Cor. 2.3 can be computed.

Theorem 2.4 (Seese [22, page 523]). From a sentence $\varphi$ and $f \in \mathbb{N}$, one can compute an $f$-equivalent sentence in Hanf normal form.

Proof. Let $\beta$ express that a structure has degree $\le f$. Then search for a tautology of the form $\beta \to (\varphi \leftrightarrow \psi)$ where $\psi$ is a sentence in Hanf normal form. Since the set of tautologies is recursively enumerable, we can do this search effectively. And since we know from Theorem 2.2 that an $f$-equivalent sentence in Hanf normal form exists, this search will eventually terminate successfully. □

Note that Seese's procedure to compute $\psi$ is not primitive recursive. A primitive recursive construction of a Hanf normal form was described by Durand and Grandjean [5] and independently by Lindell [19]. They present a quantifier elimination scheme and do not rest their reasoning on Ehrenfeucht-Fraïssé games. But so far, no elementary upper bound for the running time of their algorithm is known. The main result of this paper is an elementary procedure for the computation of a Hanf normal form. This is achieved by a new (direct) proof of Corollary 2.3 that does not use games.

The effective constructions of Hanf normal forms led Seese [22], Durand and Grandjean [5] and Lindell [19] to efficient algorithms for the evaluation of first-order queries on structures of bounded degree. Seese showed that sentences in Hanf normal form can be evaluated in time linear in the structure and the Hanf normal form. Consequently, the set of pairs $(\mathcal{A}, \varphi)$ with $\mathcal{A}$ a structure of degree $\le f$ and $\varphi$ a sentence with $\mathcal{A} \models \varphi$ can be decided in time

\[ g_1(|\varphi|, f) + g_2(|\varphi|, f) \cdot |\mathcal{A}| \tag{2} \]

where $g_1(|\varphi|, f)$ is the time needed to compute the Hanf normal form and $g_2(|\varphi|, f)$ is its size² (it can be shown that the function $g_2$ is elementary since the radii appearing in the Hanf normal form can be bounded). Since Seese's construction is not primitive recursive, the function $g_1$ is not primitive recursive. The constructions by Durand and Grandjean and by Lindell show that $g_1$ can be replaced by a primitive recursive function $g_1'$. Since they get another Hanf normal form, the function $g_2$ also changes, to $g_2'$, say (but as for Seese's Hanf normal form, this function is also elementary).

In addition, Durand and Grandjean and Lindell show that the set of tuples $\bar a$ from $\mathcal{A}$ with $\mathcal{A} \models \varphi(\bar a)$ can be computed in time

\[ g_1'(|\varphi|, f) + g_2'(|\varphi|, f) \cdot \bigl( |\mathcal{A}| + |\{\bar a \mid \mathcal{A} \models \varphi(\bar a)\}| \bigr) \tag{3} \]

where $\varphi$ is a first-order formula and $f$ is the degree of the structure $\mathcal{A}$. Since $g_2'$ is elementary, this result is of particular importance if this set has to be computed for many structures $\mathcal{A}$ and a fixed formula $\varphi$. This was recently improved by Kazana and Segoufin who compute this set in time

\[ 2^{2^{2^{O(|\varphi|)}}} \cdot \bigl( |\mathcal{A}| + |\{\bar a \mid \mathcal{A} \models \varphi(\bar a)\}| \bigr) . \]

Here, the triply exponential factor originates from the work by Frick and Grohe [9] and the summand $g_1'$ is avoided since they do not precompute a Hanf normal form. Our result in this paper will show that the Hanf normal form can be computed in triply exponential time. Consequently, the functions $g_1, g_2$ from (2) and $g_1', g_2'$ from (3) can be replaced by triply exponential functions. As a result, the model checking algorithm by Seese and the enumeration algorithm by Durand and Grandjean and by Lindell perform as well as the algorithms by Frick and Grohe and by Kazana and Segoufin, respectively.



3 Construction of a Hanf normal form 

A Hanf formula with free variables from $\bar x$ is a formula of the form

\[ \exists^{\ge m} x_{n+1} : \mathrm{sph}_\tau(\bar x, x_{n+1}) \]

where $\tau$ is a sphere with $n + 1$ centers. A formula is in Hanf normal form if it is a Boolean combination of Hanf formulas.

Theorem 3.1. From a formula $\Phi$ with free variables among $\bar x$ and $f \ge 1$, one can construct an $f$-equivalent formula $\Psi$ in Hanf normal form. This construction can be carried out in time

\[ 2^{f^{2^{O(|\Phi|)}}} . \]

² It should be noted that Frick and Grohe proved this problem to be solvable with $g_1$ the identity and $g_2$ triply exponential in $|\varphi|$ and $f$ [9].
The construction of $\Psi$ from $\Phi$ will be done by induction on the construction of $\Phi$. The central part in this induction is described by the following lemma (the proof of Theorem 3.1 can be found at the end of this section).

Lemma 3.2. From a formula $\varphi$ in Hanf normal form with free variables among $\bar x, x_{n+1}$ and $f \ge 1$, one can construct a formula $\psi$ in Hanf normal form with free variables in $\bar x$ such that $\exists x_{n+1} : \varphi$ and $\psi$ are $f$-equivalent. This construction can be carried out in time $|\varphi| \cdot 2^{n^{O(1)} \cdot f^{O(d)}}$ where $d$ is the maximal radius of a sphere appearing in $\varphi$. Furthermore, the largest radius appearing in $\psi$ is $3d$.

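Proof. Set $e = 3d$. The formula $\psi$ will be a disjunction with one disjunct for every $e$-sphere $\tau'$ with $n + 1$ centers. This disjunct will have the form

\[ \psi_{\tau'} = \varphi_{\tau'} \land \exists^{\ge 1} x_{n+1} : \mathrm{sph}_{\tau'} . \]

We next describe how $\varphi_{\tau'}$ is obtained from $\varphi$. For this, let $\alpha = \exists^{\ge m} x_{n+2} : \mathrm{sph}_\tau$ be some Hanf formula appearing in $\varphi$. This formula will be replaced by the Hanf formula $\alpha'$ that we construct next. In this construction, we distinguish two cases, namely whether the $d(\tau)$-sphere around $c_{n+1} c_{n+2}$ in $\tau$ is connected or not.

(a) $S^{\tau}_{d(\tau)}(c_{n+1} c_{n+2})$ is connected.

Let $p$ denote the number of elements $c \in B^{\tau'}_{2d(\tau)}(c_{n+1})$ with

\[ S^{\tau'}_{d(\tau)}(\bar c\, c_{n+1}\, c) \cong \tau . \]

In this case, $\alpha'$ is true if $p \ge m$ and false otherwise.

(b) $S^{\tau}_{d(\tau)}(c_{n+1} c_{n+2})$ is not connected.

Let $\sigma$ denote the sphere $S^{\tau}_{d(\tau)}(\bar c\, c_{n+2})$ and let $p$ denote the number of elements $c \in \tau'$ with $\mathrm{dist}^{\tau'}(c_{n+1}, c) < 2d(\tau)$ and $S^{\tau'}_{d(\tau)}(\bar c\, c) \cong \sigma$. In this case, set

\[ \alpha' = \exists^{\ge m+p} x_{n+2} : \mathrm{sph}_\sigma(\bar x, x_{n+2}) . \]

This finishes the construction of $\varphi_{\tau'}$ and therefore of the disjunction $\psi$. Clearly, $\psi$ is in Hanf normal form.

Now let $a_{n+1} \in \mathcal{A}$ with $S^{\mathcal{A}}_e(\bar a\, a_{n+1}) \cong \tau'$. We will show

\[ (\mathcal{A}, \bar a\, a_{n+1}) \models \alpha \iff (\mathcal{A}, \bar a) \models \alpha' , \]

again distinguishing the two cases above.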
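(a) First let $S^{\tau}_{d(\tau)}(c_{n+1} c_{n+2})$ be connected. Then, for $a_{n+2} \in \mathcal{A}$ with $S^{\mathcal{A}}_{d(\tau)}(\bar a\, a_{n+1}\, a_{n+2}) \cong \tau$, we have $\mathrm{dist}^{\mathcal{A}}(a_{n+1}, a_{n+2}) = \mathrm{dist}^{\tau}(c_{n+1}, c_{n+2}) \le 2d(\tau) - 1 < 2d(\tau)$ and therefore $a_{n+2} \in B^{\mathcal{A}}_{2d(\tau)}(a_{n+1})$. Hence

\begin{align*}
|\{a_{n+2} \in \mathcal{A} \mid S^{\mathcal{A}}_{d(\tau)}(\bar a\, a_{n+1}\, a_{n+2}) \cong \tau\}|
  &= |\{a_{n+2} \in B^{\mathcal{A}}_{2d(\tau)}(a_{n+1}) \mid S^{\mathcal{A}}_{d(\tau)}(\bar a\, a_{n+1}\, a_{n+2}) \cong \tau\}| \\
  &= |\{c \in B^{\tau'}_{2d(\tau)}(c_{n+1}) \mid S^{\tau'}_{d(\tau)}(\bar c\, c_{n+1}\, c) \cong \tau\}| = p
\end{align*}

where the last equality follows from $S^{\mathcal{A}}_e(\bar a\, a_{n+1}) \cong \tau'$ and $e \ge 3d(\tau)$. Hence we showed

\begin{align*}
(\mathcal{A}, \bar a\, a_{n+1}) \models \alpha
  &\iff (\mathcal{A}, \bar a\, a_{n+1}) \models \exists^{\ge m} x_{n+2} : \mathrm{sph}_\tau(\bar x, x_{n+1}, x_{n+2}) \\
  &\iff p \ge m \\
  &\iff (\mathcal{A}, \bar a) \models \alpha' .
\end{align*}

(b) Next consider the case that $S^{\tau}_{d(\tau)}(c_{n+1} c_{n+2})$ is not connected. Then, for $a_{n+2} \in \mathcal{A}$, we have $S^{\mathcal{A}}_{d(\tau)}(\bar a\, a_{n+1}\, a_{n+2}) \cong \tau$ if and only if

\[ \mathrm{dist}^{\mathcal{A}}(a_{n+1}, a_{n+2}) \ge 2d(\tau) \quad\text{and}\quad S^{\mathcal{A}}_{d(\tau)}(\bar a\, a_{n+2}) \cong \sigma . \]

But this implies

\begin{align*}
|\{a_{n+2} \in \mathcal{A} \mid S^{\mathcal{A}}_{d(\tau)}(\bar a\, a_{n+1}\, a_{n+2}) \cong \tau\}|
  &= |\{a_{n+2} \in \mathcal{A} \mid \mathrm{dist}^{\mathcal{A}}(a_{n+1}, a_{n+2}) \ge 2d(\tau),\ S^{\mathcal{A}}_{d(\tau)}(\bar a\, a_{n+2}) \cong \sigma\}| \\
  &= |\{a_{n+2} \in \mathcal{A} \mid S^{\mathcal{A}}_{d(\tau)}(\bar a\, a_{n+2}) \cong \sigma\}| \\
  &\quad - |\{a_{n+2} \in \mathcal{A} \mid \mathrm{dist}^{\mathcal{A}}(a_{n+1}, a_{n+2}) < 2d(\tau),\ S^{\mathcal{A}}_{d(\tau)}(\bar a\, a_{n+2}) \cong \sigma\}| \\
  &= |\{a_{n+2} \in \mathcal{A} \mid S^{\mathcal{A}}_{d(\tau)}(\bar a\, a_{n+2}) \cong \sigma\}| \\
  &\quad - |\{c \in \tau' \mid \mathrm{dist}^{\tau'}(c_{n+1}, c) < 2d(\tau),\ S^{\tau'}_{d(\tau)}(\bar c\, c) \cong \sigma\}| \\
  &= |\{a_{n+2} \in \mathcal{A} \mid S^{\mathcal{A}}_{d(\tau)}(\bar a\, a_{n+2}) \cong \sigma\}| - p .
\end{align*}

Hence

\begin{align*}
(\mathcal{A}, \bar a\, a_{n+1}) \models \alpha
  &\iff (\mathcal{A}, \bar a\, a_{n+1}) \models \exists^{\ge m} x_{n+2} : \mathrm{sph}_\tau \\
  &\iff |\{a_{n+2} \in \mathcal{A} \mid S^{\mathcal{A}}_{d(\tau)}(\bar a\, a_{n+1}\, a_{n+2}) \cong \tau\}| \ge m \\
  &\iff |\{a_{n+2} \in \mathcal{A} \mid S^{\mathcal{A}}_{d(\tau)}(\bar a\, a_{n+2}) \cong \sigma\}| \ge m + p \\
  &\iff (\mathcal{A}, \bar a) \models \alpha' .
\end{align*}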
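To see the shift of the threshold in case (b) at work, consider a numerical instance. The values $m = 3$ and $p = 2$ are chosen by us for illustration only, and $N_\tau$, $N_\sigma$ are ad hoc names for the two counts appearing above.

```latex
\begin{align*}
  N_\tau &= |\{a_{n+2} \in \mathcal{A} \mid S^{\mathcal{A}}_{d(\tau)}(\bar a\, a_{n+1}\, a_{n+2}) \cong \tau\}|, \qquad
  N_\sigma = |\{a_{n+2} \in \mathcal{A} \mid S^{\mathcal{A}}_{d(\tau)}(\bar a\, a_{n+2}) \cong \sigma\}|, \\
  N_\tau &= N_\sigma - p = N_\sigma - 2, \\
  N_\tau \ge 3 &\iff N_\sigma - 2 \ge 3 \iff N_\sigma \ge 5 .
\end{align*}
```

So in this instance $\alpha = \exists^{\ge 3} x_{n+2} : \mathrm{sph}_\tau$ is replaced by $\alpha' = \exists^{\ge 5} x_{n+2} : \mathrm{sph}_\sigma(\bar x, x_{n+2})$, which no longer mentions $x_{n+1}$.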

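We next evaluate the size of the formula $\psi$. Since $\psi$ is a disjunction of formulas $\psi_{\tau'}$, we first fix some $e$-sphere $\tau'$ with $n + 1$ centers (with $e = 3d$). Then $\tau'$ has $\le f^{3d+1} \cdot (n + 1)$ elements. Hence the formula $\mathrm{sph}_{\tau'}$ has size $\le f^{O(d)} \cdot (n + 1)^{O(1)}$ (the constant $O(1)$ depends on the signature $L$). Now we deal with the formula $\varphi_{\tau'}$. It results from $\varphi$ by the replacement of subformulas of the form $\alpha = \exists^{\ge m} x_{n+2} : \mathrm{sph}_\tau$. In the first case, $|\alpha'| \le |\alpha|$. In the second case, note that $\sigma$ is a subsphere of $\tau$, so $|\mathrm{sph}_\sigma| \le |\mathrm{sph}_\tau| \le |\alpha|$. Furthermore, $p \le f^{2d(\tau)+1} \le f^{2d+1}$. Note that the formula

\[ \alpha' = \exists^{\ge m+p} x_{n+2} : \mathrm{sph}_\sigma(\bar x, x_{n+2}) \]

is shorthand for

\[ \exists y_1, y_2, \dots, y_{m+p} : \bigwedge_{1 \le i < j \le m+p} y_i \ne y_j \land \forall y : \Bigl( \Bigl( \bigvee_{1 \le i \le m+p} y = y_i \Bigr) \to \mathrm{sph}_\sigma(\bar x, y) \Bigr) . \tag{4} \]

The size of this formula is bounded by

\[ O(p^2) + |\mathrm{sph}_\sigma| \le O(f^{4d+2}) + |\alpha| . \]

Since $\varphi_{\tau'}$ is obtained from $\varphi$ by at most $|\varphi|$ replacements, we obtain

\[ |\varphi_{\tau'}| = |\varphi| \cdot O(f^{4d+2}) + |\varphi| \le |\varphi| \cdot f^{O(d)} \]

and therefore

\[ |\varphi_{\tau'} \land \exists^{\ge 1} x_{n+1} : \mathrm{sph}_{\tau'}| = f^{O(d)} \cdot (|\varphi| + n^{O(1)}) . \]

The number of disjuncts of $\psi$ equals the number of $3d$-spheres with $n + 1$ centers. Since any such sphere has at most $f^{3d+1}(n + 1)$ elements, the number of these spheres is bounded by

\[ 2^{n^{O(1)} f^{O(d)}} \]

which finally results in

\[ |\psi| \le 2^{n^{O(1)} f^{O(d)}} \cdot f^{O(d)} \cdot (|\varphi| + n^{O(1)}) \le |\varphi| \cdot 2^{n^{O(1)} f^{O(d)}} . \]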

We finally come to the evaluation of the time needed to compute $\psi$. The crucial point in our estimation is the time needed to compute the numbers $p$ in (a) and (b); we only discuss (a).

There are $\le f^{2d(\tau)+1} - 1$ candidates $c$ in $B^{\tau'}_{2d(\tau)}(c_{n+1})$. For each of them, we have to compute the set $B^{\tau'}_{d(\tau)}(c)$ (which can be done in time $f^{d+1} - 1$). Then, isomorphism of $\tau$ and $S^{\tau'}_{d(\tau)}(\bar c\, c_{n+1}\, c)$ has to be decided. But these are two structures of degree $\le f$ and of size $(n+2) \cdot (f^{d+1} - 1) \le |\varphi| \cdot (f^{d+1} - 1)$. Hence, by [21], this isomorphism test can be performed in time polynomial in the size of the structures (the degree of the polynomial depends on $f$). Hence, the number $p$ can indeed be computed within the given time bound. □

We now come to the proof of the central result of this paper: 

Proof (of Theorem 3.1). The proof is carried out by induction on the construction of the formula $\Phi$. So first, let $\varphi$ be a quantifier-free subformula of $\Phi$ whose free variables are among $x_1, \dots, x_n$. Let $T$ be the set of all 1-spheres $\tau$ of degree $\le f$ with $n + 1$ centers such that the constants $c_1, \dots, c_n$ of $\tau$ satisfy $\varphi$. Then set

\[ \psi = \bigvee_{\tau \in T} \exists^{\ge 1} x_{n+1} : \mathrm{sph}_\tau . \]

Note that any 1-sphere with $n + 1$ centers has precisely $n + 1$ elements. Furthermore, $n \le |\Phi|$ since $\varphi$ is a subformula of $\Phi$. Hence the formula $\mathrm{sph}_\tau$ has size $n^{O(1)} \le |\Phi|^{O(1)}$ and there are $2^{|\Phi|^{O(1)}}$ disjuncts in the formula $\psi$ (where the constants $O(1)$ depend on the signature $L$), i.e., $|\psi| = 2^{|\Phi|^{O(1)}}$.

We now come to the induction step. The computation of Hanf normal forms of $\neg\varphi$ and of $\varphi \lor \varphi'$ is straightforward from Hanf normal forms of $\varphi$ and $\varphi'$. The only critical point in the induction are subformulas of the form $\exists x_{n+1} : \beta$. By the induction hypothesis, $\beta$ can be transformed into an $f$-equivalent Hanf normal form $\varphi$ and then Lemma 3.2 is invoked, yielding an $f$-equivalent Hanf normal form for $\exists x_{n+1} : \beta$. We have to invoke Lemma 3.2 at most $|\Phi|$ times where the number $n$ is always bounded by $|\Phi|$. Each invocation increases the radius of the spheres considered by a factor of three, so the maximal radius will be $3^{|\Phi|} = 2^{O(|\Phi|)}$. Hence, each invocation of Lemma 3.2 increases the formula by a factor of $2^{f^{2^{O(|\Phi|)}}}$. Putting this to the power of $|\Phi|$ does not change the expression. □
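The growth of the radius in this induction is easy to tabulate. The sketch below assumes a start radius of 1 (the radius used in the base case) and one tripling per invocation of Lemma 3.2; the function name `radius_after` is ours.

```python
def radius_after(invocations, start=1):
    """Maximal sphere radius after the given number of invocations of Lemma 3.2."""
    return start * 3 ** invocations


# Radii after 0, 1, 2, 3, 4 invocations: each step multiplies the radius by three.
radii = [radius_after(k) for k in range(5)]

# 3**k <= 4**k = 2**(2k), which is the reason why 3**|Phi| is of the form 2**O(|Phi|).
bounded = all(radius_after(k) <= 2 ** (2 * k) for k in range(50))
```

The radius is thus singly exponential in the number of quantifiers, and it is this radius that ends up as the exponent of $f$ in the bound of Lemma 3.2.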
4 Optimality 

In this section, we give a matching lower bound for the size of an $f$-equivalent formula in Hanf normal form. Namely, we prove

Theorem 4.1. There is a family of sentences $(\chi_n)_{n \in \mathbb{N}}$ such that $|\chi_n| \in O(n)$ and every 3-equivalent formula $\psi_n$ in Hanf normal form has size at least $2^{2^{2^n}}$.

The formulas $\chi_n$ will speak about labeled trees. More formally, our signature $L$ consists of two binary relations $S_0$ and $S_1$ and one unary relation $U$. A structure $\mathcal{A} = (A, S_0^{\mathcal{A}}, S_1^{\mathcal{A}}, U^{\mathcal{A}})$ over this signature is a tree if there is $X \subseteq \{0, 1\}^*$ finite, nonempty and prefix-closed such that

\[ \mathcal{A} \cong (X, \{(u, u0) \mid u0 \in X\}, \{(u, u1) \mid u1 \in X\}, H) \]

for some $H \subseteq X$. The tree is complete if every inner node has two children and any two maximal paths have the same length; this length is called the height of the tree (i.e., $X = \{0, 1\}^{\le h}$ where $h$ is the height). A forest is a disjoint union of trees. As in [9, Lemma 23], one can construct formulas $\chi_n$ of size $O(n)$ such that for any forest $\mathcal{A}$, we have

$\mathcal{A} \models \chi_n$ if and only if any two complete trees of height $2^n$ in $\mathcal{A}$ are non-isomorphic.
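The encoding of trees as prefix-closed sets of binary words can be sketched directly. Words are represented as Python strings over the characters `0` and `1`; the names `complete_tree` and `is_prefix_closed` are our own, and the labeling $H$ is left out.

```python
from itertools import product


def complete_tree(h):
    """All binary words of length at most h, with the two successor relations."""
    X = {"".join(w) for k in range(h + 1) for w in product("01", repeat=k)}
    S0 = {(u, u + "0") for u in X if u + "0" in X}
    S1 = {(u, u + "1") for u in X if u + "1" in X}
    return X, S0, S1


def is_prefix_closed(X):
    """Every nonempty word in X has its longest proper prefix in X."""
    return all(w[:-1] in X for w in X if w)


X, S0, S1 = complete_tree(2)
```

Every node has at most one parent and at most two children, so such trees have degree at most 3, which matches the 3-equivalence used in this section.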
Lemma 4.2. Let $\psi$ be a formula in Hanf normal form that is 3-equivalent to $\chi_n$. Then

\[ |\psi| \ge 2^{2^{2^n+1}-1} . \]

Proof. Suppose $|\psi| < 2^{2^{2^n+1}-1}$. Let $M$ be the maximal number $m$ such that $\exists^{\ge m} x : \mathrm{sph}_\sigma$ appears in $\psi$ (for any sphere $\sigma$). We can assume that $\psi$ does not contain any formula $\mathrm{sph}_\sigma$ where $\sigma$ is a 1-sphere. The complete tree of height $2^n$ has $2^{2^n+1} - 1$ nodes. Hence there are $2^{2^{2^n+1}-1}$ ways to color such a tree. Since we assume $|\psi| < 2^{2^{2^n+1}-1}$, there is one such tree $\mathcal{B}$ (with root $r$) such that the formula $\mathrm{sph}_{(\mathcal{B}, r)}$ does not appear in $\psi$.
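The counting step of this proof can be checked for small $n$. The sketch assumes that a coloring is an arbitrary choice of the unary relation on the nodes, so a tree with $k$ nodes has $2^k$ colorings; the function names are ours.

```python
def nodes(height):
    """Number of nodes of the complete binary tree of the given height."""
    return 2 ** (height + 1) - 1


def colourings(height):
    """Number of subsets of the node set, i.e. of choices for the unary relation."""
    return 2 ** nodes(height)


# n = 1: trees of height 2**1 = 2; n = 2: trees of height 2**2 = 4.
small = {n: (nodes(2 ** n), colourings(2 ** n)) for n in (1, 2)}
```

Already for $n = 2$ there are $2^{31}$ colorings, while the sentence $\chi_n$ has size linear in $n$.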
Next, we need a bit of terminology. If $\tau$ is a tree, $a$ a node in $\tau$, and $d \in \mathbb{N}$, then also $\tau \restriction B_d^{\tau}(a)$ is a tree that we denote $N_d^{\tau}(a)$. Recall that the sphere $S_d^{\tau}(a) = (N_d^{\tau}(a), a)$ about $a$ of radius $d$ has an additional constant.

Now we define a structure $\mathcal{A}_0$. It consists of $M + 1$ copies of any of the structures $N_d^{\mathcal{B}}(b)$ where

1. $1 \le d \le 2^n$ and $b$ is not the root of $\mathcal{B}$ or

2. $d < 2^n$.

Finally, let $\mathcal{A}_2 = \mathcal{A}_0 \uplus \mathcal{B} \uplus \mathcal{B}$ be the disjoint union of $\mathcal{A}_0$ and two copies of the tree $\mathcal{B}$. Since $\mathcal{A}_0$ does not contain any complete tree of height $2^n$, we get $\mathcal{A}_0 \models \chi_n$ and therefore $\mathcal{A}_0 \models \psi$. Note that any sphere realized in $\mathcal{A}_0$ or $\mathcal{A}_2$ is also realized in $\mathcal{B}$. So let $b \in \mathcal{B}$ and $d \in \mathbb{N}$. We distinguish several cases:

1. $1 \le d \le 2^n$ and $b$ is not the root of $\mathcal{B}$. Then the sphere $(N_d^{\mathcal{B}}(b), b)$ is realized in $\mathcal{A}_0$ more than $M$ times, hence the same holds for $\mathcal{A}_2$.

2. $d < 2^n$. Then $(N_d^{\mathcal{B}}(b), b)$ is realized in $\mathcal{A}_0$ more than $M$ times, hence the same holds for $\mathcal{A}_2$.

3. $b$ is the root of the tree $\mathcal{B}$ and $d = 2^n$. Then $N_d^{\mathcal{B}}(b) = \mathcal{B}$. Hence $S_d^{\mathcal{B}}(b)$ is not realized in $\mathcal{A}_0$ and it is realized twice in $\mathcal{A}_2$. But validity of $\psi$ does not depend on this number since $\psi$ does not mention the formula $\mathrm{sph}_{(\mathcal{B}, r)}$.

Hence, we obtain $\mathcal{A}_2 \models \psi$, contrary to our assumption that $\chi_n$ and $\psi$ are 3-equivalent. □

The theorem now follows immediately from this lemma.